Demystifying MLOps with Vetiver

Myles Mitchell @ Jumping Rivers

Before we start…

Who am I?

  • Principal Data Scientist @ Jumping Rivers:

    • Project management.

    • Python & machine learning support for clients.

    • Teach courses in programming, SQL, ML.

    • Organise North East & Leeds data science meetups.

Talk plan

  • How I got into Data Science

  • First encounter with MLOps

  • Getting to grips with Vetiver (code examples)

  • MLOps tips & tricks

Some context…

Jumping Rivers

↗ jumpingrivers.com   𝕏 @jumping_uk

  • Machine learning
  • Dashboard development
  • R packages and APIs
  • Data pipelines
  • Code review
     

How I got into Data Science

My route into Data Science

  • PhD in Astrophysics (started 2017)

  • Extra training in “Data Intensive Science”

  • … academia is hard

  • Joined Jumping Rivers full time in 2022 (following an internship)

Life as a “Data Scientist”

My initial experience:

  • Software development (check out diffify.com)

  • Course writing and teaching

  • LOTS of merge requests

  • Conferences and meetups

My first encounter with MLOps

Typical data science workflow

  • Data is imported and tidied.
  • Cycle of data transformation, visualisation and modelling.
  • Results are communicated to an external audience.

From Classical Stats to Machine Learning

  • Classical statistical modelling prioritises understanding the system behind the data.
  • By contrast, machine learning tends to prioritise prediction.
  • As data grows we retrain our ML models to optimise predictive power.
  • A goal of MLOps is to streamline this cycle.

MLOps: Machine Learning Operations

  • Framework to continuously build, deploy and maintain ML models.
  • Encapsulates the “full stack” from data acquisition to model deployment.
  • Includes versioning, deployment and monitoring.
  • Sounds simple enough …

Reality

The dreaded architecture diagram…

Reality for an MLOps beginner

  • Countless permutations

    • Modelling frameworks
    • Cloud platforms
    • Environment and container managers
  • Very multidisciplinary

  • Expensive

  • Where to even begin…?

Getting to grips with Vetiver

MLOps frameworks

  • Amazon SageMaker
  • Google Cloud Platform
  • Kubeflow (ML toolkit for Kubernetes)
  • Vetiver by Posit (free to install, nice for beginners)
  • And the list goes on…

     

Vetiver

  • Integrates with popular ML libraries in R and Python.
  • Fluent tooling to version, deploy and monitor a trained model.
  • Deploy to a cloud service or to localhost.

Let’s build an MLOps stack!

Data

  • Palmer Penguins dataset:

    library("palmerpenguins")
    
    names(penguins)
    [1] "species"           "island"            "bill_length_mm"   
    [4] "bill_depth_mm"     "flipper_length_mm" "body_mass_g"      
    [7] "sex"               "year"             
  • Let’s predict species using flipper length, body mass and island!

Figure: scatter plot showing a positive relationship between penguin flipper length and body mass. Points are coloured by species and shaped by island; Gentoo penguins tend to have higher body mass and longer flippers than Adelie and Chinstrap.

Palmer Penguins dataset

Data tidying

  • Using {tidyr} and {rsample}:

    # Drop missing data
    penguins_data = tidyr::drop_na(penguins)
    
    # Split into train and test sets
    penguins_split = rsample::initial_split(
      penguins_data, prop = 0.8
    )
    train_data = rsample::training(penguins_split)
    test_data = rsample::testing(penguins_split)

Modelling

  • Let’s set up the model recipe in {tidymodels}:

    library("tidymodels")
    
    model = recipe(
      species ~ island + flipper_length_mm + body_mass_g,
      data = train_data
    ) |>
      workflow(nearest_neighbor(mode = "classification")) |>
      fit(train_data)

Model testing

  • Our model object can now be used to predict species:

    model_pred = predict(model, test_data)
    
    # Accuracy for unseen test data
    mean(
      model_pred$.pred_class == as.character(
        test_data$species
      )
    )
    [1] 0.9104478

Enter Vetiver!

  • Convert our {tidymodels} model to a {vetiver} model:

    v_model = vetiver::vetiver_model(
      model,
      model_name = "k-nn",
      description = "penguin-species"
    )
    v_model
    
    ── k-nn ─ <bundled_workflow> model for deployment 
    penguin-species using 3 features
  • Contains all the info needed to version, store and deploy our model!

Model versioning

  • Use {pins} to store R or Python objects for reuse later.

  • Store pins using “boards” including Posit Connect, Amazon S3 or even Google Drive!

  • Storing in a temporary directory:

    model_board = pins::board_temp(
      versioned = TRUE
    )
    model_board |>
      vetiver::vetiver_pin_write(v_model)

Retrieving a model

  • Retrieve a model

    model_board |> vetiver::vetiver_pin_read("k-nn")
    
    ── k-nn ─ <bundled_workflow> model for deployment 
    penguin-species using 3 features
  • Inspect the stored versions

    model_board |> pins::pin_versions("k-nn")
    # A tibble: 1 × 3
      version                created             hash 
      <chr>                  <dttm>              <chr>
    1 20250930T144517Z-67749 2025-09-30 15:45:17 67749

Model deployment

  • We deploy models as APIs which take input data and send back model predictions.

  • APIs can be hosted at public endpoints on the web.

  • We can run them on localhost during testing and development.

  • {vetiver} uses {plumber} to create a model API.

Deploying locally

  • {vetiver} and {plumber} support local deployment:

    plumber::pr() |>
      vetiver::vetiver_api(v_model) |>
      plumber::pr_run()
  • Query the API via a simple dashboard or the command line.

  • Great for beginners to MLOps and APIs!
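
As a minimal sketch of querying the local API from R, {vetiver} can wrap the endpoint so that `predict()` works as usual. This assumes the API above was started with `plumber::pr_run(port = 8000)` in a separate R session; the example penguins are illustrative.

```r
library("vetiver")

# Wrap the running API's /predict route as an endpoint object
endpoint = vetiver::vetiver_endpoint(
  "http://127.0.0.1:8000/predict"
)

# New penguins to classify (columns must match the model's predictors)
new_penguins = data.frame(
  island = c("Biscoe", "Dream"),
  flipper_length_mm = c(220, 190),
  body_mass_g = c(5200, 3600)
)

# Sends the data to the API and returns the predicted species
predict(endpoint, new_penguins)
```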

Deploying to Connect

  • Vetiver integrates nicely with Posit Connect:

    vetiver::vetiver_deploy_rsconnect(
      board = model_board, "k-nn"
    )
  • Easier and quicker if the pinned model is already stored on Connect.

  • We can also publish to Amazon SageMaker using vetiver_deploy_sagemaker().

Deploying to other cloud platforms

  • We start by preparing a Docker container:

    vetiver::vetiver_prepare_docker(
      model_board,
      "k-nn"
    )
  • This command:

    • Lists R dependencies with {renv}

    • Stores the {plumber} API code in plumber.R

    • Generates a Dockerfile

Dockerfiles

  • Our Dockerfile contains a series of commands to:

    • Install the system libraries (Windows|Mac|Linux).

    • Set the R version and install the required R packages.

    • Run the API in the deployment environment.
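
With the Dockerfile generated, building and serving the image follows the usual Docker workflow. A sketch, assuming the defaults from vetiver_prepare_docker(); the image name is illustrative:

```sh
# Build the image from the generated Dockerfile in the current directory
docker build -t penguin-model-api .

# Run the API, mapping the container's port 8000 to the host
docker run --rm -p 8000:8000 penguin-model-api
```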

Model monitoring
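
{vetiver} also provides monitoring helpers for tracking performance as new labelled data arrives. A minimal sketch, assuming a hypothetical data frame new_data with a date column, the known species, and a .pred_class column of model predictions:

```r
library("vetiver")
library("yardstick")

# Aggregate classification accuracy by month
metrics = vetiver::vetiver_compute_metrics(
  new_data,
  date_var = date,          # column of observation dates (assumed)
  period = "month",
  truth = species,          # known outcome
  estimate = .pred_class,   # model prediction
  metric_set = yardstick::metric_set(accuracy)
)

# Plot performance over time to spot model drift
vetiver::vetiver_plot_metrics(metrics)
```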

Aside: What about Python?

  • Vetiver is available for both Python and R!

  • In Python you would use Python ML libraries rather than {tidymodels}:

    • scikit-learn
    • PyTorch
    • XGBoost
    • statsmodels
  • Vetiver documentation: vetiver.posit.co
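
A rough sketch of the same workflow in Python with vetiver and scikit-learn. The feature set is simplified here: the categorical island column is dropped, since scikit-learn's k-NN needs encoded inputs.

```python
from palmerpenguins import load_penguins   # same dataset as the R examples
from sklearn.neighbors import KNeighborsClassifier
from vetiver import VetiverModel, VetiverAPI

# Train a k-NN classifier on the numeric features
penguins = load_penguins().dropna()
X = penguins[["flipper_length_mm", "body_mass_g"]]
y = penguins["species"]
model = KNeighborsClassifier().fit(X, y)

# Wrap for versioning and deployment, as with vetiver_model() in R
v = VetiverModel(model, model_name="k-nn", prototype_data=X)

# Serve predictions locally, as with {plumber} in R
VetiverAPI(v).run(port=8080)
```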

MLOps tips & tricks

Modelling tips

  • Move from large CSVs to more efficient formats like Parquet and Arrow.
  • Version your data or SQL query commands.
  • Consider auto-ML tools like H2O.ai and SageMaker Autopilot for selecting models and features.
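
A minimal sketch of the CSV-to-Parquet move with the {arrow} package; the file name is illustrative:

```r
library("arrow")
library("palmerpenguins")

# Write the data once as Parquet...
arrow::write_parquet(penguins, "penguins.parquet")

# ...then read it back faster than a CSV, with column types preserved
penguins_data = arrow::read_parquet("penguins.parquet")
```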

Deployment

  • Try deploying locally to check that your model API works as expected.
  • Use environment managers like {renv} to store model dependencies.
  • Structuring your code as an R package can help with organising your project dependencies and unit tests.
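
A minimal {renv} workflow for pinning model dependencies might look like:

```r
# Initialise a project-local library and lockfile (renv.lock)
renv::init()

# ...install and use packages as normal, then record exact versions
renv::snapshot()

# On the deployment machine, restore the recorded versions
renv::restore()
```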

Cost considerations

  • Some cloud platforms offer free trials (e.g., SageMaker 2-month trial).
  • May be cheaper if you’re already invested in a particular cloud platform:
    • Data services
    • App deployment
  • Costs can rise depending on computational resources consumed.
  • Model building and deployment use different environments!

Take home lessons

  • Life as a Data Scientist isn’t always about machine learning!

  • Architecture diagrams can be incredibly useful.

  • … but do consider your target audience!

  • You can get started on MLOps right now with free and open source tools.

  • Consider whether it is worth the cost/effort before investing in cloud infrastructure.

Thanks for listening!